Alternative splice site recognition based on a new fuzzy support vector machine.
Authors
Abstract
Accurate alternative splice site (ASS) recognition is an important and difficult problem in gene identification, and the average recognition rate is still about 85% [1]. Many statistical pattern recognition methods, such as neural networks (NNs) and support vector machines (SVMs), have been used for this task [2,3]. Among them, the SVM can construct a good high-dimensional learning model from a limited training set and generalizes well, which gives it distinct advantages on small-sample, non-linear, and high-dimensional pattern recognition problems [4]. To reduce the impact of noise samples on the construction of the optimal hyperplane, the fuzzy SVM (FSVM) method was proposed [5]. Each sample is assigned its own membership and therefore contributes differently to the objective function. Because noise samples have smaller memberships, their effects on the separating hyperplane are reduced or eliminated. The design of the fuzzy membership function (FMF) is critical for FSVM [6,7]. A good FMF should assign higher memberships to support vectors and lower memberships to noise samples. The FMF is generally constructed from the distances between samples and class centers [8], from tightness defined by a mixed kernel function [9], or from tightness defined by a mixed kernel function in feature space [10]. These methods reduce the impact of noise samples to some extent, but they also reduce the memberships of support vectors. Here we designed a new membership calculation method that simultaneously reduces the memberships of noise samples and increases the memberships of support vectors. Consider a fuzzy training sample set $S = \{(x_1, y_1, s_1), (x_2, y_2, s_2), \ldots, (x_n, y_n, s_n)\}$, where $x_i$ is the sample vector, $y_i \in \{-1, 1\}$ is the sample category label, and $s_i$, which lies between 0 and 1, is the sample fuzzy membership that reflects the importance of $x_i$. The FSVM decision function is
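As a rough illustration of the class-center approach to membership design mentioned in the abstract [8], the sketch below assigns each sample a membership that shrinks with its distance from its own class center. It is not the new method proposed in this paper. The function name `class_center_memberships`, the smoothing constant `delta`, and the toy data are assumptions made for this example only.

```python
import math


def class_center_memberships(samples, labels, delta=1e-3):
    """Distance-to-class-center fuzzy memberships (illustrative sketch).

    For each class, s_i = 1 - d_i / (r + delta), where d_i is the Euclidean
    distance from sample i to the mean of its class and r is the largest
    such distance within that class. `delta` (an assumed small constant)
    keeps every membership strictly positive.
    """
    memberships = [0.0] * len(samples)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        dim = len(samples[idx[0]])
        # Class center: component-wise mean of the samples in this class.
        center = [sum(samples[i][d] for i in idx) / len(idx) for d in range(dim)]
        dists = {i: math.dist(samples[i], center) for i in idx}
        radius = max(dists.values())
        for i in idx:
            memberships[i] = 1.0 - dists[i] / (radius + delta)
    return memberships


# Toy data (hypothetical): four clustered positives, one distant positive
# that plays the role of a noise sample, and two negatives.
toy_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (6.0, 6.0),
               (-1.0, -1.0), (-2.0, -1.0)]
toy_labels = [1, 1, 1, 1, 1, -1, -1]
toy_memberships = class_center_memberships(toy_samples, toy_labels)
```

In this toy set the distant positive sample receives the smallest membership in its class, which is the intended noise suppression. Any sample far from its class center is penalized in the same way, including samples near the class boundary that may be support vectors, which is the limitation the abstract attributes to existing FMFs. If scikit-learn is available, memberships like these could be supplied as per-sample weights through the `sample_weight` argument of `SVC.fit`, which rescales the penalty parameter for each sample.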
Related resources
Robustified distance based fuzzy membership function for support vector machine classification
Fuzzification of the support vector machine has been used to deal with outlier and noise problems. This is achieved by means of a fuzzy membership function, which is generally built from the distance of the points to the class centroid. The focus of this research is twofold. Firstly, by taking the advantage of robust statistics in the fuzzy SVM, more emphasis on reducing the im...
Support vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of the Support Vector Machine (SVM). In this paper, a new SVR model with probabilistic constraints is proposed in which the output data and bias are treated as random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Automatic Face Recognition via Local Directional Patterns
Automatic facial recognition has many potential applications in different areas of human-computer interaction. However, they are not yet fully realized due to the lack of an effective facial feature descriptor. In this paper, we present a new appearance based feature descriptor, the local directional pattern (LDP), to represent facial geometry and analyze its performance in recognition. An LDP feat...
Prototype based recognition of splice sites
Splice site recognition is an important subproblem of de novo gene finding, splice junctions constituting the boundary between coding and non-coding regions in eukaryotic DNA. The availability of large amounts of sequenced DNA makes the development of fast and reliable tools for automatic identification of important functional regions of DNA necessary. We present a prototype based pattern recog...
Journal title:
- Acta biochimica et biophysica Sinica
Volume 45, Issue 5
Pages -
Publication date 2013